Estimating the size and evolution of categorised topics in web directories

Authors

  • Ioannis Anagnostopoulos
  • Christos-Nikolaos Anagnostopoulos
Abstract

This paper proposes a statistical approach for estimating the evolution of categorized web page populations in web directories. The approach is based on the capture-recapture method used in wildlife biological studies, modified with the assumptions and amendments needed to conduct the experiments on the web. In these experiments, web pages are likened to animals, and specific categories of web pages are likened to particular animal species whose abundance, birth rates and survival rates are estimated. The capture-recapture model used treats the populations under study as open: over time the population evolves, with new web pages entering the study while others are removed or become inactive, resembling the natural processes of migration and death. Artificial intelligence classifiers capable of categorizing web pages play the role of the biologists who recognize the species under study. Four different simulations, based on four different real classification cases, were conducted to evaluate the robustness of the model in the web paradigm. The paper provides the implementation details of the proposed web-based capture-recapture model, along with its initial assessment.
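The abstract does not name the specific open-population model, so the following is only a minimal sketch that assumes the classic Jolly-Seber estimators as a stand-in. It shows how abundance, survival and births could be estimated from the capture histories of web pages in one category. The function name `jolly_seber`, the history format (page id mapped to the set of sampling occasions on which a classifier saw the page) and the assumption that every captured page is "released" are all illustrative choices, not details taken from the paper.

```python
from typing import Dict, List, Optional, Set


def jolly_seber(histories: Dict[str, Set[int]], k: int) -> List[dict]:
    """Jolly-Seber estimates for sampling occasions 0..k-1 (assumed model).

    histories maps a page id to the occasions on which it was captured.
    Returns one dict per occasion with keys M (marked pages alive),
    N (abundance), phi (survival to next occasion) and B (births).
    A value is None where the estimator is undefined.
    """
    n, m, R, r, z = [], [], [], [], []
    for i in range(k):
        caught = [h for h in histories.values() if i in h]
        n.append(len(caught))
        # Captured at i and already seen on an earlier occasion.
        m.append(sum(1 for h in caught if min(h) < i))
        # Assumption: every captured page is released back (R_i = n_i).
        R.append(len(caught))
        # Captured at i and seen again on a later occasion.
        r.append(sum(1 for h in caught if max(h) > i))
        # Seen before and after i, but missed at i.
        z.append(sum(1 for h in histories.values()
                     if h and i not in h and min(h) < i < max(h)))

    # Marked population: nothing is marked before the first occasion.
    M: List[Optional[float]] = [0.0] + [None] * (k - 1)
    for i in range(1, k):
        if r[i] > 0:
            M[i] = m[i] + R[i] * z[i] / r[i]

    # Abundance: scale the catch by the marked fraction.
    N: List[Optional[float]] = [None] * k
    for i in range(k):
        if M[i] is not None and m[i] > 0:
            N[i] = n[i] * M[i] / m[i]

    # Survival rate between occasions i and i + 1.
    phi: List[Optional[float]] = [None] * k
    for i in range(k - 1):
        if M[i] is not None and M[i + 1] is not None:
            at_risk = M[i] - m[i] + R[i]
            if at_risk > 0:
                phi[i] = M[i + 1] / at_risk

    # Births (new pages) between occasions i and i + 1.
    B: List[Optional[float]] = [None] * k
    for i in range(k - 1):
        if N[i] is not None and N[i + 1] is not None and phi[i] is not None:
            B[i] = N[i + 1] - phi[i] * (N[i] - n[i] + R[i])

    return [{"M": M[i], "N": N[i], "phi": phi[i], "B": B[i]} for i in range(k)]


if __name__ == "__main__":
    # Hypothetical capture histories over four crawls of one category.
    demo = {"a": {0, 1}, "b": {0, 2}, "c": {0, 1, 2}, "d": {1, 2},
            "e": {1, 3}, "f": {2, 3}, "g": {0}, "h": {1, 2, 3}, "i": {3}}
    for occasion, estimate in enumerate(jolly_seber(demo, 4)):
        print(occasion, estimate)
```

With samples as small as the demo data, these estimators can return survival rates above 1 or negative birth counts. This is a known small-sample property of the Jolly-Seber formulas rather than a bug in the sketch.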


Similar resources

Design and Implementation of a Web directory for Medical Education (WDME): a Tool to Facilitate Research in Medical Education

Introduction: Access to medical education resources on the web is one of the current challenges for researchers and medical science educators. The purpose of the current project was to design and implement a comprehensive and specific subject/web directory of medical education. Methods: First, the categories to be incorporated in the directory were defined through reviewing related directories an...


An Ontology-Based Model for the Dynamic Population of Web Directories

Abstract: In this chapter we study how we can organize the continuously proliferating Web content into topical categories, also known as Web directories. In this respect, we have implemented a system, named TODE, that uses a Topical Ontology for Directories' Editing. First, we describe the process for building our ontology of Web topics, which are treated in TODE as directories' topics. Then, we ...


TODE: An Ontology-Based Model for the Dynamic Population of Web Directories

In this paper we study how we can organize the continuously proliferating Web content into topical categories, also known as Web directories. In this respect, we have implemented a system, named TODE, that uses a Topical Ontology for Directories' Editing. First, we describe the process for building our ontology of Web topics, which are treated in TODE as directories' topics. Then, we present how...


Estimating evolution of freshness in Internet cache directories under the capture-recapture methodology

Abstract: In this paper, we describe a new web sampling schema for measuring the evolution of freshness in search engines. The methodology used is capture-recapture, which is mainly applied for estimating evolution rates in wildlife biological studies. After modifications and amendments necessary for web paradigm application, we conducted three capture-recapture experiments of different d...


A Comparative Study of Performance of Adaptive Web Sampling and General Inverse Adaptive Sampling in Estimating Olive Production in Iran

Nowadays, there is an increasing use of sampling methods in network and spatial populations. Although the most common link-tracing designs such as adaptive cluster sampling and snowball sampling have advantages over conventional sampling designs such as simple random sampling and cluster sampling, these designs still present many drawbacks. Adaptive web sampling is a new link-tracing design tha...



Journal title:
  • Web Intelligence and Agent Systems

Volume 8  Issue 

Pages  -

Publication date 2010